On Discovering Concept Entities from Web Sites

Authors

Ming Yin

Dion Hoe-Lian Goh

Ee-Peng Lim

Abstract

A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.
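The merging step described in the abstract can be illustrated with a small sketch. This is not the kWUM procedure itself, whose details are not given here. It assumes a hypothetical site-specific rule (pages in the same URL directory belong to one concept entity) and a hypothetical `WebUnit` type, and it relabels each merged unit by a page-weighted vote over the fragments' labels.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath


@dataclass
class WebUnit:
    """Hypothetical container: a non-empty set of page URLs plus a category label."""
    pages: set[str]
    label: str


def directory_key(url: str) -> str:
    """Assumed site-specific rule: host plus directory identifies one entity."""
    parsed = urlparse(url)
    return parsed.netloc + posixpath.dirname(parsed.path)


def merge_incomplete_units(units: list[WebUnit]) -> list[WebUnit]:
    """Merge units whose first page (in sorted order) lies in the same directory."""
    groups: dict[str, list[WebUnit]] = defaultdict(list)
    for unit in units:
        # min() picks a deterministic representative page for the unit.
        groups[directory_key(min(unit.pages))].append(unit)

    merged = []
    for fragments in groups.values():
        pages = set().union(*(f.pages for f in fragments))
        # Each fragment votes for its label with weight equal to its page count.
        votes = Counter()
        for f in fragments:
            votes[f.label] += len(f.pages)
        merged.append(WebUnit(pages=pages, label=votes.most_common(1)[0][0]))
    return merged
```

Under this assumed rule, two fragments of one home-page directory carrying different labels would collapse into a single unit that takes the label backed by more pages. A real system would replace `directory_key` with whatever knowledge the target site actually supports.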

Similar resources

An introduction to methods of discovering and identifying ancient sites with emphasis on evidence and geomorphologic techniques

Recognizing the position of ancient sites is of great help to archaeologists. After this recognition, the archaeologist, relying on the knowledge and usual techniques of archaeology, can determine the range of the sites. With this information, the archaeologist can learn about the social, economic, livelihood, and political past of the sites. In this researc...

Discovery of Concept Entities from Web Sites using Web Unit Mining

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A Survey on Web Service Discovering and Composition

This paper reviews the existing techniques used in discovering and composing services. The task of selecting an adequate service can quickly grow tedious if all services that are listed under a certain description have to be compared manually for the final selection. Moreover, the final selection does not only depend on service parameters like execution costs or accuracy, but de...

Discovering Entity Knowledge Bases on the Web

Recognition and disambiguation of named entities in text is a knowledge-intensive task. Systems are typically bound by the resources and coverage of a single target knowledge base (KB). In place of a fixed knowledge base, we attempt to infer a set of endpoints which reliably disambiguate entity mentions on the web. We propose a method for discovering web KBs and our preliminary results suggest ...

Publication date: 2005
